The possibility of strong biases in multicomponent Maximum Likelihood fits with component-dependent templates
has been demonstrated in some toy problems. We discuss here in detail a problem of practical interest: particle
identification based on time-of-flight or dE/dx information. We show that large biases can occur in estimating
particle fractions in a sample if differences between the momentum spectra of the particles are ignored, and we present a
more robust fit technique that allows bias-free estimation even when the particle spectra in the sample are unknown.



1. Introduction 

It has been shown in some toy problems [1] that strong
biases may occur in a multicomponent Maximum
Likelihood fit whenever the templates, i.e. the functions
used to parameterize the probability distributions
in the fit, are not fixed but depend on
event observables. An interesting example of such
a problem in experimental High Energy Physics
practice is the statistical separation of different
kinds of particles on the basis of limited-precision
measurements of particle-dependent quantities, such as
Time-of-Flight or energy loss (dE/dx).

2. Particle fraction estimation

Consider a sample of particles generated by a certain
physical process in our experiment. We know
that the sample is a mixture of known particle
types, for example Pions, Kaons and Protons, but
we do not know the fraction of each
type, denoted by $f_\pi$, $f_K$ and $f_p$ respectively. Let us
assume that our experimental apparatus includes a
Particle Identification (PID) device, which measures
some quantities whose distribution
depends on the particle type. Using this PID information,
we want to estimate $f_\pi$, $f_K$ and $f_p$ by means
of an unbinned Maximum Likelihood fit to our data
sample.

The above problem is very common in particle
physics. It occurs, for example, in separating different
decay modes of a given particle [3] (same final-state
multiplicity and topology but different final-state
particle types), in studies of the fragmentation of heavy
quarks [2], and in optimizing the performance of algorithms
for tagging the flavor of B mesons [2].

We will consider two common methods of particle
identification. One is based on the measurement
of the energy loss of charged particles through the
ionization of a gas or a semiconductor (often the
same device used to measure the particle momentum),
the so-called dE/dx measurement; the other is based
on the measurement of the particle's Time-of-Flight (TOF).
A common feature of PID devices
based on these principles is that the separation
power between different particles is not constant
but depends strongly on the momentum of the
(unknown) particle. A clear example of this feature is
shown in Fig. 1, where the mean dE/dx response of
different particles is plotted as a function of momentum
in the drift chamber of a typical High Energy
Physics experiment. Even assuming a constant measurement
resolution, the separation power
changes dramatically over a short momentum range.
Because the mean value of the PID response depends
on the particle momentum, the
templates describing the p.d.f. of the PID variable are
not fixed but depend, event by event, on
the momentum of the particle: we are clearly in the
situation described in [1], where the templates of the fit
depend on another observable of the fit itself.
Fig. 1. The mean value of the energy loss of charged particles
as a function of momentum in a typical experiment.

2.1. The Likelihood expression 

Consider, for simplicity, only the PID information
provided by a dE/dx measurement. Our observables
are then the dE/dx (pid) and the momentum of the
track (mom); we denote by type the particle
hypothesis. Unfortunately, we cannot simply
write the Likelihood function as:

$$\mathcal{L}(f_j) = \prod_i \Big( \sum_{j=\pi,K,p} f_j\, P(pid_i \mid mom_i, type_j) \Big) \qquad (1)$$

Using expression (1) may give a strongly biased result
if our additional variable, the momentum, has
different distributions for different particle types
(see next section). As discussed in [1], whenever the
templates used in a multi-component fit depend on
additional observables, avoiding the bias requires
the correct, complete Likelihood expression,
including the explicit distributions of all
observables for all classes of events. In our case, this
means that the Likelihood must include
the momentum distribution of each particle
type. Note that in practice these
distributions are almost always different.

We then write the correct Likelihood function as: 

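$$\begin{aligned}
\mathcal{L}(f_j) &= \prod_i \Big( \sum_{j=\pi,K,p} f_j\, P(pid_i, mom_i \mid type_j) \Big) \\
&= \prod_i \Big( \sum_{j=\pi,K,p} f_j\, P(pid_i \mid mom_i, type_j)\, P(mom_i \mid type_j) \Big), \qquad (2)
\end{aligned}$$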

with the condition:

$$\sum_{j=\pi,K,p} f_j = 1. \qquad (3)$$
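As a short worked step, the role of the momentum term can be made explicit. Suppose the momentum distribution were the same for every type, $P(mom_i \mid type_j) = g(mom_i)$ for all $j$. Then Eq. (2) factorizes as

$$\mathcal{L}(f_j) = \prod_i g(mom_i) \sum_{j=\pi,K,p} f_j\, P(pid_i \mid mom_i, type_j) = \Big[ \prod_i g(mom_i) \Big] \times \prod_i \Big( \sum_{j=\pi,K,p} f_j\, P(pid_i \mid mom_i, type_j) \Big).$$

The bracket does not depend on the fractions, so Eq. (1) and Eq. (2) would have the same maximum. When the momentum distributions differ between types, no such factor can be pulled out, Eq. (1) is no longer proportional to the correct Likelihood, and its maximum need not coincide with the true fractions.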

3. A toy study 

We generated a sample of different particle types
with known composition as follows:



• The PID variable is distributed, for each particle, according
to a typical resolution function (i.e. the
template used in the fit) of the residual:

$$PID_{measured} - PID_{expected}(mom) \qquad (4)$$

Note the momentum dependence of the expected
PID value. We chose typical, realistic values for all
the parameters involved. This distribution represents the term

$$P(pid_i \mid mom_i, type_j) \qquad (5)$$

in Eq. (2).

• The momenta of the particles are distributed according
to a Gaussian $N(\mu_j, \sigma_j)$, where $j = \pi, K, p$ and:

$$\mu_\pi = 1.00, \quad \mu_K = 1.25, \quad \mu_p = 1.25, \qquad \sigma_\pi = \sigma_K = \sigma_p = 0.50.$$

These distributions represent the term

$$P(mom_i \mid type_j) \qquad (6)$$

of Eq. (2).

• The particle fractions were fixed to:

$f_\pi = 50\%$, $f_K = 35\%$, $f_p = 15\%$.

We then used an unbinned Maximum Likelihood fit
to estimate the particle fractions of the sample, using
the Likelihood function of Eq. (2) with:

$$P(mom_i \mid type_j) = N(\mu_j, \sigma_j). \qquad (7)$$
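For concreteness, the two Likelihoods can be written as negative log-likelihood functions. The sketch below uses Python with NumPy/SciPy (our choice; the text does not say what software was used), assumes the template values $P(pid_i \mid mom_i, type_j)$ have already been evaluated into an $N \times 3$ array, and uses hypothetical function names.

```python
import numpy as np
from scipy.stats import norm


def nll_incomplete(f, pdf_pid):
    """Negative log of Eq. (1).

    f       : fractions (f_pi, f_K, f_p), shape (3,), summing to 1
    pdf_pid : template values P(pid_i | mom_i, type_j), shape (N, 3)
    """
    return -np.sum(np.log(pdf_pid @ f))


def nll_complete(f, pdf_pid, mom, mu, sigma):
    """Negative log of Eq. (2) with the Gaussian momentum terms of Eq. (7).

    mom       : momenta, shape (N,)
    mu, sigma : per-type Gaussian parameters, shape (3,)
    """
    # P(mom_i | type_j) = N(mu_j, sigma_j); broadcasting gives shape (N, 3)
    pdf_mom = norm.pdf(mom[:, None], loc=mu, scale=sigma)
    return -np.sum(np.log((pdf_pid * pdf_mom) @ f))
```

If all three momentum Gaussians are identical, the two functions differ only by a constant independent of f, so they have the same minimum; with the different values of $\mu_j$ used in this study they do not.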

Fig. 2 (upper plot) shows the distributions of the estimators
for $f_\pi$ and $f_p$ for thirty toy samples of
ten thousand particles each. As expected, the fractions
returned by the fit are well centered on the true
input values.

Conversely, the same distributions obtained with
the incomplete Likelihood function of Eq. (1) (Fig.
2, lower plot) are affected by a bias much larger than
the nominal statistical uncertainty of the measurements,
due to the difference in the momentum distributions
of the particle types. This demonstrates that
the effect predicted in [1] is indeed very significant
in real-life Particle Identification problems.
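The toy study can be sketched in outline as follows, for a single sample rather than thirty. The fractions and momentum parameters are those quoted above; everything describing the PID response is a hypothetical stand-in, since the text does not give it: a mean response $1 + (m_j/mom)^2$ (the $1/\beta^2$ behaviour of dE/dx at low momentum), a 7% relative Gaussian resolution, and a lower momentum cut `P_MIN` that keeps the Gaussian spectra positive (the fit uses the same truncated Gaussians, so Eq. (7) matches the generator exactly). Masses are rounded values in GeV/c², and momentum is assumed to be in GeV/c.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, truncnorm

# Values quoted in the text
F_TRUE = np.array([0.50, 0.35, 0.15])   # f_pi, f_K, f_p
MU = np.array([1.00, 1.25, 1.25])       # momentum means
SIG = np.array([0.50, 0.50, 0.50])      # momentum widths
# Hypothetical ingredients (not given in the text)
MASS = np.array([0.1396, 0.4937, 0.9383])  # pi, K, p masses [GeV/c^2]
P_MIN = 0.1                             # lower momentum cut keeping mom > 0
REL_RES = 0.07                          # 7% relative PID resolution


def expected_pid(mom, mass):
    """Hypothetical mean PID response ~ 1/beta^2 = 1 + (m/p)^2."""
    return 1.0 + (mass / mom) ** 2


def mom_pdf(mom):
    """P(mom_i | type_j): the Gaussians of the text, truncated at P_MIN. Shape (N, 3)."""
    a = (P_MIN - MU) / SIG
    return truncnorm.pdf(mom[:, None], a, np.inf, loc=MU, scale=SIG)


def pid_pdf(pid, mom):
    """P(pid_i | mom_i, type_j): Gaussian template with a momentum-dependent mean."""
    mean = expected_pid(mom[:, None], MASS)
    return norm.pdf(pid[:, None], loc=mean, scale=REL_RES * mean)


def generate(n, rng):
    """Draw (pid, mom) for n particles with the true fractions F_TRUE."""
    types = rng.choice(3, size=n, p=F_TRUE)
    mom = np.empty(n)
    for j in range(3):
        sel = types == j
        a = (P_MIN - MU[j]) / SIG[j]
        mom[sel] = truncnorm.rvs(a, np.inf, loc=MU[j], scale=SIG[j],
                                 size=int(sel.sum()), random_state=rng)
    mean = expected_pid(mom, MASS[types])
    pid = rng.normal(mean, REL_RES * mean)
    return pid, mom


def fractions(theta):
    """Softmax map: keeps f_j > 0 and enforces constraint (3), sum f_j = 1."""
    e = np.exp(np.concatenate(([0.0], theta)))
    return e / e.sum()


def fit(pid, mom, complete):
    """Unbinned ML fit of the fractions with Eq. (2), or Eq. (1) if not complete."""
    terms = pid_pdf(pid, mom)
    if complete:
        terms = terms * mom_pdf(mom)
    nll = lambda theta: -np.sum(np.log(terms @ fractions(theta)))
    res = minimize(nll, x0=np.zeros(2), method="Nelder-Mead")
    return fractions(res.x)


rng = np.random.default_rng(42)
pid, mom = generate(10_000, rng)
f_complete = fit(pid, mom, complete=True)
f_incomplete = fit(pid, mom, complete=False)
print("true      :", F_TRUE)
print("complete  :", f_complete.round(3))
print("incomplete:", f_incomplete.round(3))
```

The softmax parameterization is one convenient way to impose Eq. (3) without bounds. The size of the shift of the incomplete fit depends on the hypothetical PID model chosen here, so only the comparison with `F_TRUE`, not the particular numbers, carries over to the study in the text.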
Fig. 2. The Pion and Proton fraction estimator distributions
when the complete (top) and incomplete (bottom) Likelihood
expressions are used.



3.1. The case of unknown momentum distributions

Writing the complete Likelihood function, including
the distributions of all the observables used in the
fit, is in principle relatively straightforward.

In practice, however, we often have
poor information about those distributions, and sometimes
they are completely unknown. This is the case,
for example, for particles produced in
the fragmentation of heavy quarks, whose
momentum distributions are unknown and
for which no functional form can be assumed.

Given the results of the previous section, the
question is how to write the complete Likelihood,
and so avoid the bias, when the distributions of the
additional observables are unknown.

If no specific functional form can be assumed,
we can use a general one: for example, a
Series Expansion describing the
distributions, with the expansion coefficients left as
free parameters to be determined by the fit.

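We then write the momentum term of the Likelihood
function (2) as:

$$P(mom_i \mid type_j) = \sum_{m} a_{j,m}\, U_m(mom_i) \qquad (8)$$

where m is the order of each term, $U_m$ are the basis functions
used for the series expansion, and $a_{j,m}$ are the expansion
coefficients for particle type j.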



Returning to our toy sample, we used
orthogonal polynomials as the basis for the expansion.
Among a number of possibilities, we selected
Chebyshev polynomials of the second kind (denoted by $U_m$).
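A minimal sketch of a normalized density built from Eq. (8), assuming SciPy's `eval_chebyu` for $U_m$ and a fit range `[lo, hi]` mapped linearly onto $[-1, 1]$, where the $U_m$ are defined. The range, the function name and the coefficient list are our assumptions. The normalization uses $\int_{-1}^{1} U_m(t)\,dt = 2/(m+1)$ for even $m$ and $0$ for odd $m$.

```python
import numpy as np
from scipy.special import eval_chebyu


def chebyu_density(x, coeffs, lo, hi):
    """Normalized density on [lo, hi] from a truncated U_m expansion (sketch of Eq. 8).

    coeffs : expansion coefficients a_0 ... a_M (free parameters in a fit)
    Only the shape matters: the result is divided by its integral over [lo, hi].
    """
    coeffs = np.asarray(coeffs, dtype=float)
    m = np.arange(len(coeffs))
    # Map x from [lo, hi] to t in [-1, 1]; asarray keeps scalar input as a 0-d array
    t = np.asarray((2.0 * np.asarray(x, dtype=float) - (lo + hi)) / (hi - lo))
    # sum_m a_m U_m(t), broadcasting t[..., None] against the order index m
    shape = eval_chebyu(m, t[..., None]) @ coeffs
    # Integral over t in [-1, 1]: 2/(m+1) for even m, 0 for odd m
    integral_t = np.sum(coeffs * np.where(m % 2 == 0, 2.0 / (m + 1), 0.0))
    # The Jacobian dx/dt = (hi - lo)/2 converts the integral from t to x
    return shape / (integral_t * (hi - lo) / 2.0)


# Usage: a three-term expansion evaluated on a momentum grid
print(chebyu_density(np.linspace(0.0, 3.0, 5), [1.0, 0.3, -0.2], 0.0, 3.0))
```

Because of the normalization, the overall scale of the coefficients is irrelevant, so one coefficient per particle type can be fixed (for example $a_{j,0} = 1$). A truncated expansion is not guaranteed to be positive; in a fit, negative values have to be guarded against, a detail the text does not discuss.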

We then replaced the term of Eq. (7) in expression (2)
with Eq. (8) and repeated the unbinned Maximum
Likelihood fit, this time also fitting the parameters
of the polynomial expansion. As shown in Fig. 3, the
bias is now brought back to zero, as it was when we
assumed perfect knowledge of the individual momentum
distributions of each particle type. We were thus able
to avoid the bias in the fraction fit without any
particular assumption on the functional form of the
momentum distributions, simulating the practical case
where nothing is known about the distributions of the
additional observables. Note also that only the first
seven terms of the second-kind Chebyshev expansion were
needed to parameterize the momentum distribution of
each particle type. Another interesting aspect is that,
comparing Fig. 3 to Fig. 2, no significant degradation
in the resolution of the estimator is observed, even
though the number of fit parameters has increased.
Fig. 4 shows the projections of the fit to the toy sample.

Fig. 3. The Pion and Proton fraction estimator distributions
using a Series Expansion as a parameterization of the momentum
distribution.

3.2. A more complicated case: Time of Flight

Suppose that our PID information is obtained from
a Time of Flight measurement. The expected TOF
is a function of two observables:

$$TOF_{expected}(mom, L) = \frac{L}{c} \sqrt{1 + \left( \frac{m_j}{mom} \right)^2} \qquad (9)$$

where L is the length travelled by the particle up to
its time measurement (the arclength), which is a function
of the production angle of the particle (in the
cylindrical geometry of the TOF detector), c is the
speed of light, $m_j$ is the mass of particle hypothesis
j, and mom is again the momentum. Both the
momentum and the arclength distributions may be
different for each particle type, i.e. both observables
may be a source of bias in the estimation of the
particle fractions. Assuming no correlation between
momentum and arclength, we must modify
expression (2) to:

$$\begin{aligned}
\mathcal{L}(f_j) &= \prod_i \Big( \sum_{j=\pi,K,p} f_j\, P(pid_i, mom_i, arc_i \mid type_j) \Big) \\
&= \prod_i \Big( \sum_{j=\pi,K,p} f_j\, P(pid_i \mid mom_i, arc_i, type_j)\, P(mom_i \mid type_j)\, P(arc_i \mid type_j) \Big). \qquad (10)
\end{aligned}$$

Fig. 4. The momentum projections for each particle type superimposed
on the corresponding generated distributions.

Fig. 5. The Pion and Proton fraction estimator distributions
using two Series Expansions as a parameterization of the
momentum and arclength distributions.
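Eq. (9) is straightforward to code, and doing so also illustrates the momentum dependence of the separation power discussed in Section 2. The sketch below assumes units of GeV/c for momentum, GeV/c² for mass, cm for the arclength and ns for time; the 100 cm path and the 100 ps resolution in the usage line are hypothetical values, not taken from the text.

```python
import numpy as np

C_CM_PER_NS = 29.9792458  # speed of light [cm/ns]


def expected_tof(mom, arclength, mass):
    """Expected time of flight [ns], Eq. (9): (L/c) * sqrt(1 + (m/p)^2)."""
    return arclength / C_CM_PER_NS * np.sqrt(1.0 + (mass / mom) ** 2)


def separation(mom, arclength, m1, m2, resolution_ns):
    """TOF difference between two mass hypotheses, in units of the resolution."""
    dt = expected_tof(mom, arclength, m2) - expected_tof(mom, arclength, m1)
    return dt / resolution_ns


# Hypothetical setup: L = 100 cm, 0.1 ns resolution, pion vs kaon hypotheses
moms = np.array([0.5, 1.0, 1.5, 2.0])
print(separation(moms, 100.0, 0.1396, 0.4937, 0.1).round(2))
```

With these assumed numbers the pion/kaon separation is about 3.5 resolution units at 1 GeV/c and drops to about 0.9 at 2 GeV/c, so the TOF templates, like the dE/dx ones, depend strongly on momentum (and here also on the arclength).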

We then added to our toy sample a simulation of the arclength,
distributed according to a normal distribution
$N(\mu_j, \sigma_j)$ with the values:

$$\mu_\pi = 90, \quad \mu_K = 100, \quad \mu_p = 110, \qquad \sigma_\pi = \sigma_K = \sigma_p = 25.$$

Considering again the case where no information
is available about the distributions for each particle
type, we used the same Series Expansion technique
for both variables. We repeated our fit on
thirty toy samples and, as shown in
Fig. 5, no bias was observed in this case either. It
is also interesting to observe that we used only three
terms of the Chebyshev expansion for the arclength
parameterization: this gives only an approximate
description of the data (see the arclength projections in Fig. 6),
but it does not affect the results of the fit.
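The pieces above can be combined into a sketch of the negative log of Eq. (10), with series expansions for both the momentum and the arclength terms. Assumptions, none of which come from the text: a Gaussian TOF resolution around Eq. (9), the units GeV/c, cm and ns, fixed fit ranges for the two expansions, the function names, and the clipping of the per-event term as a crude guard against negative truncated expansions.

```python
import numpy as np
from scipy.special import eval_chebyu
from scipy.stats import norm

C_CM_PER_NS = 29.9792458  # speed of light [cm/ns]


def chebyu_density(x, coeffs, lo, hi):
    """Normalized sum_m a_m U_m(t) on [lo, hi], with t mapped onto [-1, 1]."""
    coeffs = np.asarray(coeffs, dtype=float)
    m = np.arange(len(coeffs))
    t = np.asarray((2.0 * np.asarray(x, dtype=float) - (lo + hi)) / (hi - lo))
    shape = eval_chebyu(m, t[..., None]) @ coeffs
    integral_t = np.sum(coeffs * np.where(m % 2 == 0, 2.0 / (m + 1), 0.0))
    return shape / (integral_t * (hi - lo) / 2.0)


def nll_eq10(f, tof, mom, arc, masses, tof_res, mom_c, arc_c, mom_rng, arc_rng):
    """Negative log of Eq. (10) with series expansions for mom and arc.

    mom_c, arc_c     : one coefficient list per particle type (free parameters)
    mom_rng, arc_rng : (lo, hi) ranges of the two expansions
    """
    # Template P(pid_i | mom_i, arc_i, type_j): Gaussian around Eq. (9)
    expected = arc[:, None] / C_CM_PER_NS * np.sqrt(1.0 + (masses / mom[:, None]) ** 2)
    p_pid = norm.pdf(tof[:, None], loc=expected, scale=tof_res)
    # P(mom_i | type_j) and P(arc_i | type_j) from the expansions, shape (N, 3)
    p_mom = np.column_stack([chebyu_density(mom, c, *mom_rng) for c in mom_c])
    p_arc = np.column_stack([chebyu_density(arc, c, *arc_rng) for c in arc_c])
    per_event = (p_pid * p_mom * p_arc) @ f
    # Clip so that a locally negative expansion cannot produce log of a negative number
    return -np.sum(np.log(np.clip(per_event, 1e-300, None)))
```

In a fit, `f` (parameterized so that Eq. (3) holds) and the coefficient lists, for instance seven terms for the momentum and three for the arclength as in the text, would be the free parameters passed to a minimizer.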
Fig. 6. The arclength projections for each particle type superimposed
on the corresponding generated distributions.

4. Conclusions 

In this short paper we focused on a practical and
common problem in particle physics: the estimation
of particle type fractions using Particle Identification
information. We showed that a significant bias
can arise from the use of an incomplete expression
of the Likelihood under realistic conditions. We also
considered the practical case where nothing is
assumed about the distribution of an observable, and eliminated the
bias by using Series Expansions of the unknown distributions
in orthogonal polynomials, with the expansion
coefficients left as free parameters determined
by the fit. Finally, we considered a more complicated
example where two relevant observables have
unknown distributions; in this case too, the Series
Expansion was successful in avoiding biases in
determining the fractions of each component.